Parsing with Principles and Probabilities

Authors

  • Andrew Fordham
  • Matthew W. Crocker (Centre for Cognitive Science, University of Edinburgh)
Abstract

This paper is an attempt to bring together two approaches to language analysis. The possible use of probabilistic information in principle-based grammars and parsers is considered, including a discussion of some theoretical and computational problems that arise. Finally, a partial implementation of these ideas is presented, along with some preliminary results from testing on a small set of sentences.

Introduction

Both principle-based parsing and probabilistic methods for the analysis of natural language have become popular in the last decade. While the former borrows from advanced linguistic specifications of syntax, the latter has been more concerned with extracting distributional regularities from language to aid the implementation of NLP systems and the analysis of corpora.

These symbolic and statistical approaches are beginning to draw together as it becomes clear that one cannot exist entirely without the other: the knowledge of language posited over the years by theoretical linguists has been useful in constraining and guiding statistical approaches, and the corpora now available to linguists have resurrected the desire to account for real language data in a more principled way than had previously been attempted.

This paper falls directly between these approaches, using statistical information derived from corpus analysis to weight the syntactic analyses produced by a 'principles and parameters' parser. The use of probabilistic information in principle-based grammars and parsers is considered, including a discussion of some theoretical and computational problems that arise. Finally, a partial implementation of these ideas is presented, along with some preliminary results from testing on a small set of sentences.

Government-Binding Theory

The principles and parameters paradigm in linguistics is most fully realised in the Government-Binding Theory (GB) of Chomsky [Chomsky1981, Chomsky1986] and others. The grammar is divided into modules which …
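
The introduction's central idea, using corpus-derived statistics to weight the analyses a principles-and-parameters parser produces, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the statistics are relative frequencies of individual structural decisions (prepositional-phrase attachment is used purely as an example), smooths them with add-one counts, and treats decisions as independent, so an analysis scores as the sum of their log-probabilities. The toy counts and every name in it (`CORPUS_EVENTS`, `CHOICES`, `estimate`, `score`, `rank`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical corpus-derived events: (context, choice) pairs, here
# ((verb, preposition), attachment site), as they might be read off a parsed corpus.
CORPUS_EVENTS = [
    (("saw", "with"), "VP"),
    (("saw", "with"), "VP"),
    (("saw", "with"), "NP"),
    (("ate", "with"), "VP"),
    (("ate", "with"), "NP"),
]

# Alternatives the grammar licenses at each decision point (assumed).
CHOICES = ("VP", "NP")


def estimate(events, choices, alpha=1.0):
    """Return P(choice | context) estimated by relative frequency with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for context, choice in events:
        counts[context][choice] += 1

    def prob(context, choice):
        seen = counts.get(context, Counter())
        total = sum(seen.values())
        # Unseen contexts fall back to a uniform distribution over the choices.
        return (seen[choice] + alpha) / (total + alpha * len(choices))

    return prob


def score(decisions, prob):
    """Log-probability of one analysis, assuming its decisions are independent."""
    return sum(math.log(prob(context, choice)) for context, choice in decisions)


def rank(analyses, prob):
    """Sort (label, decisions) pairs from most to least probable."""
    return sorted(analyses, key=lambda item: score(item[1], prob), reverse=True)


prob = estimate(CORPUS_EVENTS, CHOICES)

# Two analyses a principle-based parser might license for
# "saw the man with the telescope" (illustrative only).
candidates = [
    ("PP attached to NP", [(("saw", "with"), "NP")]),
    ("PP attached to VP", [(("saw", "with"), "VP")]),
]

for label, decisions in rank(candidates, prob):
    print(label, round(score(decisions, prob), 3))
# PP attached to VP -0.511
# PP attached to NP -0.916
```

Summing logarithms rather than multiplying raw probabilities avoids numerical underflow when an analysis involves many decisions. In this sketch the parser still decides which analyses are grammatical; the probabilities only order them.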

Journal:
  • CoRR

Volume: abs/cmp-lg/9408004

Publication date: 1994